Method for generating, transmitting and receiving stereoscopic images and relating devices

ABSTRACT

A method for generating a stereoscopic video stream (101) having composite images (C) that include information about a right image (R) and a left image (L), as well as at least one depth map, includes selecting pixels from the right image (R) and from the left image (L), and then entering the selected pixels into a composite image (C) of the stereoscopic video stream. The method also provides for entering all the pixels of the right image (R) and all the pixels of the left image (L) into the composite image (C) by leaving one of said two images unchanged and breaking up the other one into regions (R1, R2, R3) having a plurality of pixels. The pixels of the depth map(s) are then entered into that region of the composite image which is not occupied by pixels of the right and left images.

FIELD OF THE INVENTION

The present invention concerns the generation, storage, transmission, reception and reproduction of stereoscopic video streams, i.e. video streams which, when appropriately processed in a visualization device, produce sequences of images which are perceived as being three-dimensional by a viewer.

BACKGROUND ART

As is known, the perception of three-dimensionality can be obtained by reproducing two images, one for the viewer's right eye and the other for the viewer's left eye.

A stereoscopic video stream therefore transports information about two sequences of images, corresponding to the right and left perspectives of an object or a scene. Such a stream can also transport supplementary information.

International patent application PCT/IB2010/055918, published on 30 Jun. 2011 as WO2011/077343A1, describes a left/right image multiplexing method and a demultiplexing method (as well as related devices) which make it possible to preserve the balance between horizontal and vertical resolution, thus offering advantages over known techniques such as “side by side” and “top and bottom”.

According to said multiplexing method, the pixels of the first image (e.g. the left image) are entered into the composite image unchanged, whereas the second image is divided into regions whose pixels are arranged in free areas of the composite image, as shown in FIG. 1 a, which shows the case wherein two so-called 720p images are entered into a 1080p container frame.

In reception, the image divided into regions is reconstructed and then sent to the display. For example, displays are known which operate in accordance with the so-called “frame alternate” principle, i.e. showing the two images L and R in temporal succession. For stereoscopic vision, so-called “active” glasses must be worn, i.e. glasses which, synchronized with the succession of images L and R, shade one lens and keep the lens of the other eye open, so that each eye can only see the image intended for it.

It is known that stereoscopic vision through such displays can prove annoying for some viewers, to whom it would be desirable to offer the possibility of varying (decreasing) the depth of the images so as to adapt it to their subjective preferences and to the size of the screen. To do so, it is necessary to provide, within the display, a synthesis of intermediate images between those being transmitted, which will then be displayed in the place of the actually transmitted images. Such a reconstruction can be done, by using known techniques, if one or more depth maps associated with the transmitted images are available.

Furthermore, so-called self-stereoscopic displays have recently begun to appear on the market, which do not require the use of glasses. Such displays also carry out a synthesis of non-transmitted images, and therefore require at least one depth map providing the information necessary for such synthesis.

It has thus become necessary to introduce a new format for generating, transporting and reconstructing stereoscopic streams, which format can be used for traditional 2D reception and reproduction devices and for current two-view stereoscopic 3D reception and reproduction devices (with or without depth adjustment), as well as for future self-stereoscopic devices using more than two views, while at the same time preserving the utmost compatibility of the format with the video stream production and distribution infrastructures and devices currently in use.

BRIEF DESCRIPTION OF THE INVENTION

It is therefore the object of the present invention to propose a method for generating, transmitting and receiving stereoscopic images, and related devices, aimed at fulfilling the above-described requirements.

The invention relates to a method and a device for multiplexing the two images relating to the right and left perspectives (hereafter referred to as right image and left image), as well as one or more depth maps, within a single composite frame.

The invention also relates to a method and a device for demultiplexing said composite image, i.e. for extracting therefrom the right and left images and the depth map(s) entered by the multiplexing device.

As can be seen in FIG. 1 a, pertaining to the above-mentioned international patent application (the so-called “tile format”), in the composite image there is an unused region (C5) whose dimensions are half, both horizontally and vertically, of those of the two images L and R. According to one possible embodiment of the invention, at least one depth map (DM) can be entered into said unused region, as shown in FIG. 1 b.

A depth map relating to an image x is to be understood as a grayscale image wherein each pixel has a luminance value which is proportional to the depth, i.e. the coordinate “z”, of the pixel itself, where by convention it is assumed that the value z=0 corresponds to the position on the screen, and positive values of z correspond to pixels positioned behind the screen, while negative values correspond to pixels positioned in front of the screen. Since the unused region of the composite image has horizontal and vertical dimensions which are half the dimensions of the images L and R, in one embodiment of the present invention it is possible to enter into such region a depth map (relating to one of the two images L and R) having horizontal and vertical resolution equal to half that of the corresponding image. It has been observed that such a loss of resolution is not detrimental because, given the inaccuracy with which depth maps can generally be calculated or measured, it is preferable to subject full-resolution maps to undersampling operations by making interpolations between the pixel values, in that such operations can reduce the noise component, resulting in reconstructed images of higher quality.

According to other embodiments of the invention, it is possible to enter two depth maps into said unused region (C5).

The above-mentioned international patent application also describes other forms of multiplexing and demultiplexing of the stereoscopic images L and R, to which the method of the present invention can be applied as well, although less effectively because the space left available for entering the depth map is smaller. Consequently, there will be a further reduction of the resolution of said map. While still falling within the general principles of the present invention, such alternative implementations will not be described herein.

It is a particular object of the present invention to provide a method for generating, transmitting and receiving stereoscopic images, and related devices, as set out in the appended claims, which are an integral part of the present description.

BRIEF DESCRIPTION OF THE DRAWINGS

Further objects and advantages of the present invention will become more apparent from the following description of a few embodiments thereof, which are supplied by way of non-limiting example with reference to the annexed drawings, wherein:

FIG. 1 a shows the composite frame in the prior-art format (tile format);

FIG. 1 b shows one example of a composite frame according to the present invention;

FIG. 2 shows a block diagram of a device for multiplexing the right image, the left image and a depth map into a composite image;

FIG. 3 is a flow chart of a method executed by the device of FIG. 2;

FIG. 4 shows one possible form of disassembly of an image to be entered into a composite image;

FIG. 5 shows a block diagram of a device for extracting the left image, the right image and a depth map from the composite frame;

FIG. 6 is a flow chart of a method executed by the device of FIG. 5.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 2 shows a block diagram of a device 100 for generating a stereoscopic video stream 101 with at least one depth map, in accordance with the variants of the invention. In FIG. 2, the device 100 receives two sequences of images 102 and 103, e.g. two video streams respectively intended for the left eye (L) and for the right eye (R), as well as a sequence of depth maps 106 relating to the three-dimensional content associated with the stereoscopic video stream.

The depth map of the sequence 106 may be associated with one of the two left and right images respectively belonging to the sequences 102 and 103, or it may be created as an interpolation between the depth maps for the right and left images, i.e. relating to an intermediate viewpoint of the scene.

In this first embodiment, which will be described below, the depth map is generated through any one of the algorithms already known in the art, which are based, for example, on a comparison between a right image and a left image, and which return a matrix (i.e. the depth map), the size of which is equal to that, in pixels, of one of the two compared images, and the elements of which have a value which is proportional to the depth of each pixel of said image. Another depth map generation technique is based on measuring the distance of the object in the scene from the pair of video cameras that are shooting the scene: this distance can be easily measured by means of a laser. In the case of artificial video streams generated with the help of electronic computers, the video cameras are virtual ones, in that they consist of two points of view of a certain scene artificially created by a computer. In such a case, the depth maps are generated by the computer and are very accurate.

As an alternative to the example of FIG. 2, the depth maps of the sequence 106 may be generated within the device 100. In this case, the device 100, instead of receiving the sequence of depth maps from the outside, comprises a suitable module (not shown in the drawing) which receives as input the images L and R of the sequences 102 and 103 and then calculates the corresponding depth maps.

The device 100 makes it possible to implement a method for multiplexing two images of the two sequences 102 and 103 and the depth map of the sequence 106.

In order to implement the method for multiplexing the right and left images and the depth map, the device 100 comprises a disassembler module 104 for breaking up an input image (the right image in the example of FIG. 1 b) into a plurality of subimages, each corresponding to one region of the received image, an undersampling and filtering module 107 for processing the depth map, and an assembler module 105 capable of entering the pixels of received images, including the depth map, into a single composite image to be provided at its output. If no processing of the sequence 106 is necessary, the module 107 may be omitted. This may be the case, for example, when the depth map is laser-generated and has, right from the start, a lower resolution than that of the images L and R.

One example of a multiplexing method implemented by the device 100 will now be described with reference to FIG. 3.

The method starts in step 200. Subsequently (step 201), one of the two input images (right or left) is broken up into a plurality of regions, as shown in FIG. 4. In the example of FIG. 4, the disassembled image is a frame R of a 720p video stream, i.e. a progressive format with a resolution of 1280×720 pixels.

The frame R of FIG. 4 comes from the video stream 103 that carries the images intended for the right eye, and is disassembled into three regions R1, R2 and R3, preferably rectangular in shape.

The disassembly of the image R is obtained by dividing it into two portions of the same size and subsequently subdividing one of these portions into two portions of the same size.

The region R1 has a size of 640×720 pixels and is obtained by taking all the first 640 pixels of each row. The region R2 has a size of 640×360 pixels and is obtained by taking the pixels from 641 to 1280 of the first 360 rows. The region R3 has a size of 640×360 pixels and is obtained by taking the remaining pixels of the image R, i.e. the pixels from 641 to 1280 of the last 360 rows.
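By way of illustration only, the disassembly just described can be sketched as follows. The sketch assumes that an image is held as a list of rows, each row being a list of pixel values; the function name `disassemble` is hypothetical and does not appear in the drawings.

```python
def disassemble(image):
    """Split an image (list of rows) into the three regions R1, R2, R3.

    Assumption: even width and height, as for a 1280x720 frame.
    """
    height = len(image)
    width = len(image[0])
    half_w, half_h = width // 2, height // 2
    r1 = [row[:half_w] for row in image]            # first half of every row
    r2 = [row[half_w:] for row in image[:half_h]]   # second half, upper rows
    r3 = [row[half_w:] for row in image[half_h:]]   # second half, lower rows
    return r1, r2, r3


# A blank 720p frame yields regions of 640x720, 640x360 and 640x360 pixels.
frame_r = [[0] * 1280 for _ in range(720)]
r1, r2, r3 = disassemble(frame_r)
sizes = [(len(r[0]), len(r)) for r in (r1, r2, r3)]
```

The three regions together contain every pixel of the source image exactly once, so no information is lost by the disassembly.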

In the example of FIG. 2, the step of disassembling the image R is carried out by the module 104, which receives an input image R (in this case the frame R) and outputs three subimages (i.e. three groups of pixels) corresponding to the three regions R1, R2 and R3.

Subsequently (steps 202, 203 and 204) the composite image C is constructed, which comprises the information pertaining to both the right and left images and to the depth map received; in the example described herein, said composite image C is a frame of the output stereoscopic video stream, and therefore it is also referred to as container frame.

First of all (step 202), the input image received by the device 100 and not disassembled by the module 104 (the left image L in the example of FIG. 2) is entered unchanged into an undivided area within a container frame, which is sized in a manner such as to include all the pixels of both input images. For example, if the input images have a size of 1280×720 pixels, then a container frame suitable for containing both will be a frame of 1920×1080 pixels, e.g. a frame of a video stream of the 1080p type (progressive format with 1920×1080 pixels).

In the example of FIG. 1 b, the left image L is entered into the container frame C and positioned in the upper left corner. This is obtained by copying the 1280×720 pixels of the image L into an area C1 consisting of the first 1280 pixels of the first 720 rows of the container frame C.

In the next step 203, the image disassembled in step 201 by the module 104 is entered into the container frame. This is achieved by the module 105 by copying the pixels of the disassembled image into the container frame C in the areas thereof which have not been occupied by the image L, i.e. areas external to the area C1.

In order to attain the best possible compression and reduce the generation of artifacts when decompressing the video stream, the pixels of the subimages outputted by the module 104 are copied by preserving the respective spatial relations. In other words, the regions R1, R2 and R3 are copied into respective areas of the frame C without undergoing any deformation, exclusively by means of translation operations.

An example of the container frame C outputted by the module 105 is shown in FIG. 1 b.

The region R1 is copied into the last 640 pixels of the first 720 rows(area C2), i.e. next to the previously copied image L.

The regions R2 and R3 are copied under the area C1, i.e. respectively in the areas C3 and C4, which respectively comprise the first 640 pixels and the following 640 pixels of the last 360 rows.

The operations for entering the images L and R into the container frame do not imply any alterations to the balance between horizontal and vertical resolution.

The above-described technique for entering images L and R into the container frame C will hereafter be defined as tile-format type.

In the free pixels of the frame C, i.e. in the area C5, the module 105 enters, in the form of an image, the depth map (DM) pertaining to the stereoscopic pair L and R (step 204). Prior to step 204, the depth map DM may be undersampled, filtered or further processed by the module 107.
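Steps 202 to 204 can be sketched, purely as an illustration, as a sequence of translation-only copies into the areas C1 to C5. The sketch assumes images held as lists of rows and a depth map already reduced to half the width and half the height of the images; the names `paste` and `build_container` are hypothetical.

```python
def paste(frame, block, top, left):
    """Copy a block into the frame with its upper-left pixel at (top, left).

    Only a translation is applied; the block is not scaled or rotated.
    """
    for dy, row in enumerate(block):
        frame[top + dy][left:left + len(row)] = row


def build_container(left_img, right_img, depth_map):
    """Assemble a tile-format container frame holding L, R and one depth map."""
    h, w = len(left_img), len(left_img[0])
    hh, hw = h // 2, w // 2
    frame = [[0] * (w + hw) for _ in range(h + hh)]            # e.g. 1920x1080
    paste(frame, left_img, 0, 0)                               # C1 <- L (step 202)
    paste(frame, [row[:hw] for row in right_img], 0, w)        # C2 <- R1 (step 203)
    paste(frame, [row[hw:] for row in right_img[:hh]], h, 0)   # C3 <- R2
    paste(frame, [row[hw:] for row in right_img[hh:]], h, hw)  # C4 <- R3
    paste(frame, depth_map, h, w)                              # C5 <- DM (step 204)
    return frame


# Small stand-in: 4x4 images and a 2x2 depth map give a 6x6 container.
left_img = [["L"] * 4 for _ in range(4)]
right_img = [[(y, x) for x in range(4)] for y in range(4)]
depth_map = [["D"] * 2 for _ in range(2)]
container = build_container(left_img, right_img, depth_map)
```

With 1280×720 inputs and a 640×360 depth map, the same function would fill a 1920×1080 frame completely, leaving no unused pixels.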

The depth map is preferably coded as a grayscale image, the information content of which can therefore be transported by the luminance signal alone; chrominances are not used and may be, for example, null; this makes it possible to obtain an effective compression of the container frame C.

In a preferred embodiment, the depth map DM has a resolution of 640×360 pixels, corresponding to a 4-to-1 undersampling (or decimation) of the original depth map having a resolution of 1280×720 pixels, matching that of the images L and R. Each pixel of the undersampled map DM corresponds to a 2×2 pixel region of the original map. The undersampling operation is typically carried out by using procedures which are per se known in the art.
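One possible undersampling procedure is sketched below for illustration. The description does not prescribe a particular filter; averaging each 2×2 block is merely an assumption made here, and the name `undersample_2x2` is hypothetical.

```python
def undersample_2x2(depth):
    """Reduce a depth map to half width and half height.

    Assumption: each output pixel is the integer mean of one 2x2 block of
    the original map; other known filters could be used instead.
    """
    h, w = len(depth), len(depth[0])
    return [
        [
            (depth[y][x] + depth[y][x + 1]
             + depth[y + 1][x] + depth[y + 1][x + 1]) // 4
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


full_map = [
    [0, 4, 8, 8],
    [0, 4, 8, 8],
    [100, 100, 20, 40],
    [100, 100, 60, 80],
]
small_map = undersample_2x2(full_map)  # [[2, 8], [100, 50]]
```

Averaging tends to attenuate isolated noisy values, which is in line with the remark made earlier that undersampling can reduce the noise component of the map.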

The frame C thus obtained is subsequently compressed and transmitted or saved to a storage medium (e.g. a DVD). For this purpose, compression means are provided which are adapted to compress an image or a video signal, along with means for recording and/or transmitting the compressed image or video signal.

FIG. 5 shows a block diagram of a receiver 1100 which decompresses the received container frame (if compressed), reconstructs the two right and left images, and makes them available to a visualization device (e.g. a television set) allowing 3D contents to be enjoyed. The receiver 1100 may be a set-top box or a receiver built into a television set.

The same remarks made for the receiver 1100 are also applicable to a stored image reader (e.g. a DVD reader) which reads a container frame (possibly compressed) and processes it in order to obtain one pair of frames corresponding to the right and left images entered into the container frame (possibly compressed) read by the reader.

Referring back to FIG. 5, the receiver receives (via cable or antenna) a compressed stereoscopic video stream 1101 and decompresses it by means of a decompression module 1102, thereby obtaining a video stream comprising a sequence of frames C′ corresponding to the frames C. In case of an ideal channel or if container frames are being read from a mass memory or a data medium (Blu-ray, CD, DVD), the frames C′ correspond to the container frames C carrying the information about the right and left images and the depth map, except for any artifacts introduced by the compression process.

These frames C′ are then supplied to a reconstruction module 1103, which executes an image reconstruction and depth map extraction method as described below with reference to FIG. 6.

It is apparent that, if the video stream is not compressed, the decompression module 1102 may be omitted and the video signal may be supplied directly to the reconstruction module 1103.

The reconstruction process starts in step 1300, when the decompressed container frame C′ is received.

The reconstruction module 1103 extracts (step 1301) the left image L by copying the first 1280×720 contiguous pixels of the decompressed frame, i.e. the area C1, into a new frame which is smaller than the container frame, e.g. a frame of a 720p stream. The image L thus reconstructed is sent to the output of the receiver 1100 (step 1302).

The term “contiguous pixels” refers to pixels of an unchanged image belonging to an undivided area of the frame.

Subsequently, the method provides for extracting the right image R from the container frame C′.

The step of extracting the right image (see also FIG. 4) begins by copying (step 1303) the area R1 present in the frame C′. More in detail, the pixels of the 640 columns of R1 are copied into the corresponding first 640 columns of the new frame that represents the reconstructed image Rout. Subsequently, R2 is extracted (step 1304). From the decompressed frame C′ (which, as aforesaid, corresponds to the frame C of FIG. 1 b), the pixels of the area C3 (corresponding to the source region R2) are selected. At this point, the 640 columns of pixels are copied into the free columns adjacent to those just copied from R1.

As far as R3 is concerned (step 1305), the pixels of the region C4 are extracted from the frame C′ and are copied into the last free columns, in the lower right corner of the reconstructed frame.

At this point, the right image Rout has been fully reconstructed and can be outputted (step 1306).

Finally, the reconstruction module 1103 extracts (step 1307) the depth map by copying into a memory area the luminance values of the last 640×360 pixels of the decompressed container frame C′, corresponding to the area C5. The content of said memory area is sent to the output of the receiver 1100 (step 1308) and will be used by the display for generating interpolated images not transmitted in the stereoscopic video stream. The process for reconstructing the right and left images and the depth map contained in the container frame C′ is thus completed (step 1309). Said process is repeated for each frame of the video stream received by the receiver 1100, so that the output will consist of two video streams 1104 and 1105 for the right image and for the left image, respectively, and one video stream 1106 corresponding to the depth map.
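The reconstruction steps described above can be sketched as follows, for illustration only. The sketch assumes a tile-format container held as a list of rows whose dimensions are 1.5 times those of the images (e.g. 1920×1080 for 1280×720 images); the names `crop` and `unpack_container` are hypothetical.

```python
def crop(frame, top, left, height, width):
    """Return a copy of the rectangular area starting at (top, left)."""
    return [row[left:left + width] for row in frame[top:top + height]]


def unpack_container(frame):
    """Extract L, the reconstructed R and the depth map from a container."""
    fh, fw = len(frame), len(frame[0])
    h, w = fh * 2 // 3, fw * 2 // 3        # e.g. 1080 -> 720, 1920 -> 1280
    hh, hw = h // 2, w // 2
    left_img = crop(frame, 0, 0, h, w)     # area C1
    r1 = crop(frame, 0, w, h, hw)          # area C2
    r2 = crop(frame, h, 0, hh, hw)         # area C3
    r3 = crop(frame, h, hw, hh, hw)        # area C4
    depth_map = crop(frame, h, w, hh, hw)  # area C5
    # R1 fills the first half of every row; R2 and R3 fill the second half
    # of the upper and lower rows respectively.
    right_img = [
        r1[y] + (r2[y] if y < hh else r3[y - hh]) for y in range(h)
    ]
    return left_img, right_img, depth_map


# 6x6 stand-in container whose pixels record their own (row, column).
container = [[(y, x) for x in range(6)] for y in range(6)]
left_img, right_img, depth_map = unpack_container(container)
```

Since every copy is a pure translation, applying this extraction to a container built in tile format returns the original pixels of both images unchanged, apart from any compression artifacts.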

The above-described process for reconstructing the right and left images and the depth map for image synthesis is based upon the assumption that the demultiplexer 1100 knows how the container frame C was built and can thus extract the right and left images and the synthesis depth map.

Of course, this is possible if the multiplexing method is standardized.

In order to take into account the fact that the container frame may be generated according to any one of the methods that utilize the solution which is the subject of the appended claims, the demultiplexer preferably uses signaling information contained in the form of metadata in a predefined region of the composite image or in the video stream, which identifies the type of video stream being generated for knowing how to unpack the content of the composite image and how to reconstruct the right and left images and the depth map for the synthesis of supplementary stereoscopic images.

After having decoded the signaling, the demultiplexer will know the position of the unchanged image (e.g. the left image in the above-described examples), as well as the positions of the regions into which the other image was disassembled (e.g. the right image in the above-described examples) and the position of the depth map.

With this information, the demultiplexer can extract the unchanged image (e.g. the left image) and the depth map and reconstruct the disassembled image (e.g. the right image).

Although the present invention has been illustrated so far with reference to some preferred and advantageous embodiments, it is clear that it is not limited to such embodiments and that many changes may be made thereto by a man skilled in the art wanting to combine into a composite image two images relating to two different perspectives (right and left) of an object or a scene and the associated depth map.

In a possible variant, for example, instead of entering into the composite frame C the depth map relating to one of the two images, a so-called “disparity map” or “displacement map” is entered. Under suitable hypotheses (shooting with video cameras equipped with identical optics), such a map can be easily derived from the depth map, with which it can be easily related. If the two right and left images are displayed superimposed on the same display and glasses are not used to separate them, one can easily realize that in order to obtain one image from the other it is necessary to move the objects by a certain quantity. More precisely, in order to obtain the right image starting from the left image it is necessary to move the objects situated behind the screen towards the right by a quantity that increases with the depth at which such objects are located. The objects which are located exactly on the screen do not need to be moved, while the objects located in front of the screen need to be moved to the left by a quantity that increases as a function of the distance from the screen.

In the previously mentioned conditions, between depth P and disparity D a relation of the following type exists:

D=I*P/(P+P0)

where I is the interocular distance and P0 is the distance of the viewer from the screen. It should be noted that, for P tending to infinity, D will tend to I, and for P=0 (objects located on the screen) D will be equal to 0.

Of course, in order to reconstruct an intermediate image between the left and the right image, it is possible to adopt the same procedure described above, but the disparity values will have to be multiplied by a coefficient c between 0 and 1, which is a function of the distance of the intermediate viewpoint from the viewpoint of the reference image (the left one in this case).
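The relation between depth and disparity, and the scaling by the coefficient c, can be illustrated with the short sketch below. The function names and the numerical values (an interocular distance of 0.065 m and a viewing distance of 3 m) are assumptions chosen only for the example; all lengths are taken to be in the same unit.

```python
def disparity(depth, interocular, viewer_distance):
    """Evaluate D = I * P / (P + P0) for a depth P measured from the screen."""
    return interocular * depth / (depth + viewer_distance)


def intermediate_disparity(depth, interocular, viewer_distance, c):
    """Disparity for an intermediate viewpoint, scaled by c in [0, 1].

    Assumption: c = 0 is the reference (left) viewpoint and c = 1 the
    right viewpoint.
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie between 0 and 1")
    return c * disparity(depth, interocular, viewer_distance)


I = 0.065   # assumed interocular distance, in metres
P0 = 3.0    # assumed viewer-to-screen distance, in metres

on_screen = disparity(0.0, I, P0)      # object on the screen: no shift
far_away = disparity(1.0e9, I, P0)     # very distant object: close to I
midway = intermediate_disparity(P0, I, P0, 0.5)
```

For an object located as far behind the screen as the viewer is in front of it (P = P0), the full disparity is I/2, so the view halfway between left and right uses a shift of I/4.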

It should be noted that, when the right image is reconstructed by starting from the left one in accordance with the above description, or when an intermediate image is reconstructed, some areas are left uncovered, which correspond to the pixels of objects present in the right image but not in the left image, since they are shadowed by other objects in front of them (the so-called “occlusions”).

In order to make a complete reconstruction of an intermediate image, it would therefore be necessary to have available both the right and left images as well as both the depth or disparity maps. In this manner, in fact, the empty (occluded) areas can be filled by taking the corresponding pixels from the other image and by moving them by a quantity equal to the relative disparity multiplied by the coefficient 1-c.

As can be understood from the above description, another possible variant of the invention may require the entry of two depth or disparity maps, instead of one. Such maps, respectively referring to the left image and to the right image, can be entered into the same space where a single map was entered in the preceding case, by using known frame-packing techniques such as, for example, side-by-side or top-and-bottom. In the former case the horizontal resolution of both maps is further halved, whereas in the latter case the vertical resolution is halved. It is also possible to use a further variant of the frame-packing technique defined above as “tile-format”.
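The two frame-packing options mentioned above can be sketched as follows, for illustration only. The sketch assumes that both maps have already been reduced to the size of the free area and that the further halving is done by simply dropping every other column or row; a real implementation would more likely filter before decimating. The function names are hypothetical.

```python
def pack_side_by_side(map_left, map_right):
    """Place two maps next to each other, halving their horizontal resolution."""
    return [
        l_row[::2] + r_row[::2]
        for l_row, r_row in zip(map_left, map_right)
    ]


def pack_top_and_bottom(map_left, map_right):
    """Stack two maps vertically, halving their vertical resolution."""
    return [list(row) for row in map_left[::2] + map_right[::2]]


dm1 = [[1, 2, 3, 4], [5, 6, 7, 8]]
dm2 = [[9, 10, 11, 12], [13, 14, 15, 16]]
sbs = pack_side_by_side(dm1, dm2)    # [[1, 3, 9, 11], [5, 7, 13, 15]]
tab = pack_top_and_bottom(dm1, dm2)  # [[1, 2, 3, 4], [9, 10, 11, 12]]
```

In both cases the packed result has the same dimensions as a single map, so it fits into the same area of the composite frame.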

The procedures for entering the two maps on the generation side and for extracting the two maps on the reception side can be easily derived from those described with reference to the single-map case, with obvious variations well known to those skilled in the art.

Of course, the signaling present in the video stream must also be able to discern the presence of one or two maps. Consequently, said signaling must contain information adapted to allow distinguishing between at least two of the following types of composite frames:

-   1) composite frame of the tile-format type without depth or disparity maps (case of FIG. 1 a);
-   2) composite frame of the tile-format type with one depth or disparity map (case of FIG. 1 b);

and possibly also:

-   3) composite frame of the tile-format type with two depth or disparity maps in top-and-bottom configuration;
-   4) composite frame of the tile-format type with two depth or disparity maps in side-by-side configuration;
-   5) composite frame of the tile-format type with two depth or disparity maps in tile-format configuration.
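The five types of composite frame listed above could be represented on the receiving side as sketched below. The description does not define how the signaling is encoded; the integer codes used here simply mirror the numbering of the list and are an assumption, as are the names `CompositeFrameType`, `parse_signaling` and `map_count`.

```python
from enum import Enum


class CompositeFrameType(Enum):
    """Frame types to be distinguished by the signaling (codes are assumed)."""
    TILE_NO_MAP = 1
    TILE_ONE_MAP = 2
    TILE_TWO_MAPS_TOP_AND_BOTTOM = 3
    TILE_TWO_MAPS_SIDE_BY_SIDE = 4
    TILE_TWO_MAPS_TILE_FORMAT = 5


def parse_signaling(code):
    """Map a hypothetical integer code to a frame type.

    Raises ValueError for codes outside the list.
    """
    return CompositeFrameType(code)


def map_count(frame_type):
    """Number of depth or disparity maps carried by a frame of this type."""
    if frame_type is CompositeFrameType.TILE_NO_MAP:
        return 0
    if frame_type is CompositeFrameType.TILE_ONE_MAP:
        return 1
    return 2


frame_type = parse_signaling(4)
maps_present = map_count(frame_type)
```

A receiver that only supports types 1 and 2 would still be able to recognize the remaining codes and ignore the maps it cannot use.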

The receiver preferably comprises one or more processing blocks adapted to carry out one or more of the following operations, based on the signaling information:

-   recognizing the type of frame being received, for the purpose of properly reconstructing the two right and left images of the three-dimensional video content, as described above;
-   recognizing the presence of one or two depth or disparity maps and the type of configuration thereof;
-   if there are two depth or disparity maps, obtaining each one of the two maps;
-   performing, on the depth or disparity maps, operations adapted to bring the dimensions of the maps to values equal to those of the images of the video content. These operations may be, for example, of a type inverse to undersampling, e.g. interpolation operations.
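The last operation of the list, i.e. bringing a map back to the dimensions of the images, can be illustrated with the sketch below. For simplicity it uses pixel replication (nearest-neighbour interpolation), which is only one of the possible interpolation operations and is an assumption made here; the name `upsample_2x` is hypothetical.

```python
def upsample_2x(depth):
    """Double the width and height of a map by replicating each pixel.

    Assumption: nearest-neighbour interpolation; bilinear or other known
    interpolation filters could be used instead.
    """
    out = []
    for row in depth:
        wide = [value for value in row for _ in range(2)]  # double the width
        out.append(wide)
        out.append(list(wide))                             # double the height
    return out


received_map = [[1, 2], [3, 4]]
full_size = upsample_2x(received_map)
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Applied to a 640×360 map, this yields a 1280×720 map matching the images L and R.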

Other variants may concern the physical implementation of the invention. For example, the electronic modules that implement the above-described devices, in particular the device 100 and the receiver 1100, may be variously subdivided and distributed; furthermore, they may be provided in the form of hardware modules or as software algorithms implemented by a processor, in particular a video processor equipped with suitable memory areas for temporarily storing the input frames received. These modules may therefore execute in parallel or in series one or more of the video processing steps of the image multiplexing and demultiplexing methods according to the present invention. It is also apparent that, although the preferred embodiments refer to multiplexing two 720p video streams into one 1080p video stream, other formats may be used as well.

It is obvious that the order in which the multiplexing and demultiplexing procedures shown in FIGS. 3 and 6 are executed is merely exemplary: it can be modified for any reason, without changing the essence of the method.

Nor is the invention limited to a particular type of arrangement of the composite image, since different solutions for generating the composite image may offer specific advantages and/or disadvantages.

The invention, with all its variants, proposes a universal format for generating, transporting and reproducing 3D contents on any type of current or future display.

In the case of a 2D reproduction device, the video processor of the reproduction device will simply discard the images R and the depth maps (DM or DM1 and DM2) that may be present at the output of the receiver 1100 and will display, subject to scaling, only the sequence of images L on an associated visualization device.

The same applies to the case of a 3D reproduction device in which the user has activated the 2D display mode.

A 3D reproduction device in which the 3D display mode has been activated may show two different behaviors, depending on whether the depth of the scene can be adjusted (decreased) or not. If the depth cannot be adjusted, the video processor will use the two sequences of images L and R to generate the three-dimensional effect. If the depth can be adjusted, the video processor will use the depth maps (one or two) included in the composite frames C′ associated with each pair of stereoscopic images R and L to generate intermediate views between L and R, thereby obtaining three-dimensional images having a variable depth, lower than that attainable from L and R.

The last case is represented by self-stereoscopic players, which need avery large number of views (a few tens) to generate thethree-dimensional effect for viewers positioned at different points inthe space in front of the display. In this case, the video processorwill use the depth maps (one or two) included in the composite framesC′, along with the images L and R themselves, to synthesize a series ofother images. In front of the display there are a number of lenses orbarriers, such that at any point in space where stereoscopic vision inpossible the viewer will perceive just one pair of said images.

Therefore the video processor of the reproduction device may comprisemeans adapted to send to the display two sequences of images, at leastone of which consists of images synthesized by starting from at leastone of the transmitted views and from at least one depth map. In thiscase, it preferably also comprises means adapted to give the viewer thepossibility of choosing sequences of images relating to more or lessclose viewpoints, so as to vary the perception of depth.

The video processor of the reproduction device may also comprise meansadapted to generate further images corresponding to further views, sothat the viewers positioned at different points in space can seedifferent sequences of images through an associated self-stereoscopicdisplay.

None of the formats proposed until now offers such flexibility andbreadth of use, while at the same time still ensuring a very goodreproduction quality in terms of balance of the horizontal and verticalresolution and of proper resolution assignment to the stereoscopicimages and to the associated depth maps.

The above-described reconstruction operations may take place partly in the receiver device and partly in the display device.
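As a non-limiting illustration of said reconstruction operations, the following minimal sketch extracts the unchanged image, reassembles the other image from three regions and enlarges the depth map. It assumes a particular arrangement of the regions within the composite frame (unchanged image at top left, the half-width region beside it, the two quarter regions below it, the undersampled map in the remaining corner) and enlargement by simple pixel replication. The function names and the arrangement are illustrative assumptions; images are represented as lists of rows.

```python
def crop(frame, top, left, height, width):
    """Copy a group of contiguous pixels (a rectangle) out of `frame`."""
    return [row[left:left + width] for row in frame[top:top + height]]


def upsample_2x(image):
    """Double both dimensions by pixel replication (assumed interpolation)."""
    out = []
    for row in image:
        wide = [value for value in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out


def unpack_composite(composite, width, height):
    """Return (first image, second image, full-size depth map).

    `width` and `height` are the dimensions of each stereoscopic image,
    e.g. 1280 and 720 for a 1920x1080 composite frame.
    """
    half_w, half_h = width // 2, height // 2
    first = crop(composite, 0, 0, height, width)           # unchanged image
    r1 = crop(composite, 0, width, height, half_w)         # left half of the other image
    r2 = crop(composite, height, 0, half_h, half_w)        # its upper-right quarter
    r3 = crop(composite, height, half_w, half_h, half_w)   # its lower-right quarter
    small_depth = crop(composite, height, width, half_h, half_w)
    right_half = r2 + r3                                   # stack the two quarters vertically
    second = [r1[y] + right_half[y] for y in range(height)]
    return first, second, upsample_2x(small_depth)
```

In such a sketch the cropping steps could, for example, be carried out in the receiver device, while the enlargement of the depth map could be left to the display device.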

The present invention can advantageously be at least partly realized through computer programs comprising coding means for implementing one or more steps of the above-described methods, when such programs are executed by a computer. It is therefore understood that the protection scope extends to said computer programs as well as to computer-readable means that comprise recorded messages, said computer-readable means comprising program coding means for implementing one or more steps of the above-described methods, when said programs are executed by a computer. The above-described embodiment example may be subject to variations without departing from the protection scope of the present invention, including all equivalent designs known to a man skilled in the art.

The elements and features shown in the various preferred embodiments may be combined together without however departing from the protection scope of the present invention.

From the above description, those skilled in the art will be able to produce the object of the invention without introducing any further implementation details.

CLAIMS

1. A method for generating a stereoscopic video stream comprising composite images, said composite images comprising information about a right image and a left image of a three-dimensional video content, wherein pixels of said right image and pixels of said left image are selected, and said selected pixels are entered into a composite image of said stereoscopic video stream, wherein all the pixels of said right image and all the pixels of said left image are entered into said composite image by leaving one of said two images unchanged, breaking up the other one into a number of regions having a total area equal to that of said other image, and entering said regions into said composite image, wherein said composite image has greater dimensions than necessary for entering all the pixels of said left image and right image, and wherein, in the pixels of the composite image remaining after said entry, at least one depth or disparity map is entered which relates to the depth or disparity of the pixels of said three-dimensional video content, said maps being aimed at reconstructing, in reception, images not being transmitted in said stereoscopic stream.
 2. The method according to claim 1, wherein said at least one depth or disparity map is coded as a grayscale image.
 3. The method according to claim 2, wherein the video information content of said at least one depth or disparity map is transported by a single luminance signal, without using chrominance signals.
 4. The method according to claim 1, wherein said at least one depth or disparity map has a lower resolution than an original version thereof, the resolution of the latter being equal to that of said left image and right image.
 5. The method according to claim 4, wherein said at least one depth or disparity map is obtained by subjecting said original depth map to 4-to-1 undersampling.
 6. The method according to claim 1, wherein said at least one depth or disparity map is a depth or disparity map associated with one of the two right or left images, or associated with an intermediate viewpoint between one of the left images and one of the right images.
 7. The method according to claim 1, wherein said at least one depth or disparity map comprises a depth or disparity map associated with a right image and a depth or disparity map associated with a left image.
 8. The method according to claim 7, wherein said depth or disparity maps associated with a right image and a left image are entered into said remaining pixels of the composite image by means of frame-packing techniques.
 9. The method according to claim 1, wherein, if said number of regions is three, said regions are obtained through the steps of: dividing said other image into two portions having the same horizontal dimension; dividing one of said two portions into two portions having the same vertical dimension.
 10. The method according to claim 1, wherein signaling information identifying the type of video stream generated is entered as metadata into said composite image or said stereoscopic video stream.
 11. The method according to claim 10, wherein said signaling information is so adapted as to allow distinguishing between at least two of the following types of composite frames: composite frame of the tile-format type without depth maps; composite frame of the tile-format type with one depth map; composite frame of the tile-format type with two depth maps in side-by-side configuration; composite frame of the tile-format type with two depth maps in top-and-bottom configuration; composite frame of the tile-format type with two depth maps in tile-format configuration.
 12. The device for generating a stereoscopic video stream comprising composite images, said composite images comprising information about a right image, a left image, characterized in that it comprises means for implementing the steps of the method according to claim 1.
 13. A method for reconstructing at least one pair of images of a stereoscopic video stream starting from a composite image, said composite image comprising information about a right image, a left image, the method comprising the steps of: generating a first image of said right and left images by copying a single group of contiguous pixels from a first region of said composite image, generating the remaining image of said right and left images by copying other groups of contiguous pixels from a number of distinct regions of said composite image, said number of distinct regions being different from said first region; generating at least one depth or disparity map by copying at least one group of contiguous pixels from a further region of said composite image, different from said first region and from said number of distinct regions.
 14. The method according to claim 13, wherein, if said number of regions is three: one of said regions of the composite image has the same vertical dimension as said first region and half its horizontal dimension; the remaining two of said regions of the composite image have equal horizontal and vertical dimensions, and half the vertical dimension of said first region.
 15. The method according to claim 13, wherein said at least one depth or disparity map is generated by starting from a grayscale image derived from a luminance signal contained in the contiguous pixels of said further region.
 16. The method according to claim 15, comprising the step of increasing the horizontal and vertical dimensions of said at least one depth or disparity map up to a dimension equal to that of said right and left images.
 17. The method according to claim 13, comprising the step of obtaining, from said composite image or from the video stream, signaling information adapted to recognize the type of video stream being generated.
 18. The method according to claim 17, wherein said signaling information is so adapted as to allow distinguishing between at least two of the following types of composite frames: composite frame of the tile-format type without depth maps; composite frame of the tile-format type with one depth map; composite frame of the tile-format type with two depth maps in side-by-side configuration; composite frame of the tile-format type with two depth maps in top-and-bottom configuration; composite frame of the tile-format type with two depth maps in tile-format configuration.
 19. A device for reconstructing at least one pair of images of a stereoscopic video stream starting from a composite image, said composite image comprising information about a right image, a left image, the device comprising: means for generating a first image of said right and left images by copying a single group of contiguous pixels from a first region of said composite image, means for generating the remaining image of said right and left images by copying other groups of contiguous pixels from a number of distinct regions of said composite image, said number of distinct regions being different from said first region; means for generating at least one depth or disparity map by copying at least one group of contiguous pixels from a further region of said composite image, different from said first region and from said number of distinct regions.
 20. The device according to claim 19, wherein, if said number of regions is three: one of said regions of the composite image has the same vertical dimension as said first region and half its horizontal dimension; the remaining two of said regions of the composite image have equal horizontal and vertical dimensions, and half the vertical dimension of said first region.
 21. The device according to claim 19, wherein said means for generating at least one depth or disparity map utilize a grayscale image derived from a luminance signal contained in the contiguous pixels of said further region.
 22. The device according to claim 21, comprising means for increasing the horizontal and vertical dimensions of said at least one depth or disparity map up to a dimension equal to that of said right and left images.
 23. The device according to claim 19, comprising means adapted to recognize the type of video stream being received based on signaling information identifying said stream type, contained in said composite image or in said video stream.
 24. The device according to claim 23, wherein said signaling information allows distinguishing between at least two of the following types of composite frames: composite frame of the tile-format type without depth maps; composite frame of the tile-format type with one depth map; composite frame of the tile-format type with two depth maps in side-by-side configuration; composite frame of the tile-format type with two depth maps in top-and-bottom configuration; composite frame of the tile-format type with two depth maps in tile-format configuration.
 25. The device according to claim 24, comprising means which, based on said information useful for distinguishing a type of composite frame, are adapted to output: only said first image of said right and left images; or said first one and said second one of said right and left images; or said first one and said second one of said right and left images and said at least one depth or disparity map.
 26. The device according to claim 23, comprising means for carrying out one or more of the following operations, based on said signaling information: recognizing the type of frame being received, for the purpose of properly reconstructing the two right and left images of the three-dimensional video content; recognizing the presence of one or two depth or disparity maps and the type of configuration thereof; in case of two depth or disparity maps, obtaining each one of the two maps; performing, on said depth or disparity maps, operations adapted to bring the dimensions of the maps to values equal to those of the images of the video content.
 27. The device according to claim 19, comprising means for generating further images corresponding to further views by starting from said right and left images and by using said depth maps.
 28. The device according to claim 27, comprising means for displaying two sequences of images, at least one of which comprising images synthesized by starting from at least one of the transmitted views and from at least one depth map.
 29. The device according to claim 28, comprising means for giving the viewer the possibility of choosing sequences of images relating to more or less close viewpoints, so as to vary the perception of depth.
 30. The device according to claim 27, comprising a self-stereoscopic display, and comprising means for utilizing said further images corresponding to further views in order to allow viewers positioned at different points in space to see different sequences of images.
 31. A stereoscopic video stream characterized by comprising at least one composite image generated by means of the method according to claim 1.
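
Purely by way of non-limiting illustration, the generation of a composite image with three regions and one undersampled depth map can be sketched as follows. The sketch assumes one possible arrangement of the regions (unchanged image at top left, the half-width region beside it, the two quarter regions below it, the map in the remaining corner) and performs the 4-to-1 undersampling by simple decimation without pre-filtering. The function names, the arrangement and the decimation are illustrative assumptions; images are represented as lists of rows.

```python
def undersample_4_to_1(depth):
    """Keep one pixel out of every 2x2 block (assumed decimation scheme)."""
    return [row[::2] for row in depth[::2]]


def pack_composite(left, right, depth):
    """Build a composite image from two images and one full-size depth map.

    `left` is entered unchanged; `right` is broken up into three regions
    whose total area equals that of `right`. For 1280x720 inputs the result
    is a 1920x1080 frame.
    """
    height, width = len(left), len(left[0])
    half_w, half_h = width // 2, height // 2
    small_depth = undersample_4_to_1(depth)
    composite = []
    # Upper part: unchanged image followed by the left half of the other image.
    for y in range(height):
        composite.append(left[y] + right[y][:half_w])
    # Lower part: upper-right quarter, lower-right quarter, then the map,
    # which fills the pixels remaining after the two images are entered.
    for y in range(half_h):
        composite.append(
            right[y][half_w:] + right[half_h + y][half_w:] + small_depth[y]
        )
    return composite
```

With two 1280x720 images, the pixels remaining in a 1920x1080 frame number 1920 x 1080 - 2 x 1280 x 720 = 230400, i.e. exactly one 640x360 map, which is the size obtained by 4-to-1 undersampling of a 1280x720 map.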